Access Ordering and Effective Memory Bandwidth

Author

  • Steven A. Moyer
Abstract

High-performance scalar processors are characterized by multiple pipelined functional units that can be initiated simultaneously to exploit instruction-level parallelism. For scientific codes, the performance of these processors depends heavily on memory bandwidth: to achieve the peak processor rate, data must be supplied to the arithmetic units at the peak aggregate rate at which they consume it. Access ordering is a loop optimization that reorders non-caching accesses to make better use of memory system resources. It is a compiler technology, developed in this thesis, that addresses the memory bandwidth problem for scalar processors executing scientific codes. For a given computation, memory architecture, and memory device type, an access ordering algorithm determines a well-defined interleaving of vector references that maximizes effective bandwidth. Because the resulting access sequence is well defined, analytic models of its performance can also be derived. Access ordering is fundamentally different from, though complementary to, both caching and access scheduling techniques that attempt to overlap computation with memory latency. Simulation results demonstrate that, for a given computation, access ordering can significantly increase effective memory bandwidth over that achieved by the natural reference sequence.
Memory system parameters:
  • $w$: word size
  • $p$: page size
  • $T_{p/r}$: page-hit read cycle time
  • $T_{p/w}$: page-hit write cycle time
  • $T_{p/m}$: page-miss overhead
  • $T_{u/r}$: uniform-access read cycle time
  • $T_{u/w}$: uniform-access write cycle time

Stream parameters:
  • $v$: stream start address (vector accessed)
  • $s$: stride of access
  • $d$: data size
  • $m$: mode of access
  • $\sigma$: number of data items referenced per functional iteration (logical streams)
  • $\varepsilon$: number of words accessed per loop iteration (physical streams)

MAP notation:
  • $a_i$: access to the next element of stream $t_i$
  • $a_i^k$: $k$th access from $t_i$ in a given access sequence
  • $S$: set of all streams in a given computation
  • $\tilde{S}$: access sequence that embodies the streams in $S$
  • $N_S$: number of streams in $S$
  • $V_S$: number of different vectors referenced by streams in $S$
  • $b$: depth of loop unrolling

General properties of physical stream $t_i$:
  • $\gamma_i$: number of data items per word
  • $\theta_i$: intermix factor

Properties of physical stream for a sequentially interleaved architecture:
  • number of modules referenced
  • set of modules referenced
  • module stride
  • maximum number of accesses serviced at any module for a given loop iteration

Properties of physical stream for a multicopy architecture:
  • number of modules referenced
  • module stride

Modeling functions:
  • average number of accesses per page referenced
  • average per-iteration page miss count
  • average per …
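
The following minimal Python sketch makes the bandwidth effect concrete. It uses a deliberately simplified model, not the thesis's access ordering algorithm or its analytic models. The model has a single bank of page-mode memory. Every access costs a page-hit time, and an access to a page other than the currently open one adds a page-miss overhead. These are loosely $T_{p/r}$ and $T_{p/m}$, with reads and writes treated alike. The class `PageModeMemory` and the functions `natural_order` and `block_order` are hypothetical names, and the parameter values are illustrative rather than measured. `block_order` fetches `b` consecutive elements of each unit-stride stream before switching to the next stream. This is one simple way to interleave vector references so that page misses are amortized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageModeMemory:
    """Toy single-bank page-mode memory (hypothetical model and parameters).

    page_words: page size in words (roughly p)
    t_hit:      cycles per page-hit access (roughly T_{p/r}; writes treated alike)
    t_miss:     extra cycles when the accessed page is not the open one (roughly T_{p/m})
    """
    page_words: int
    t_hit: int
    t_miss: int

    def cost(self, addresses):
        """Total cycles to service the word addresses in the given order."""
        total = 0
        open_page = None
        for addr in addresses:
            page = addr // self.page_words
            total += self.t_hit
            if page != open_page:  # page miss: pay the overhead and open the new page
                total += self.t_miss
                open_page = page
        return total


def natural_order(bases, n):
    """One access per unit-stride stream per iteration, as the loop body is written."""
    return [base + i for i in range(n) for base in bases]


def block_order(bases, n, b):
    """Reordered sequence: b consecutive elements of each stream before moving on."""
    seq = []
    for start in range(0, n, b):
        stop = min(start + b, n)
        for base in bases:
            seq.extend(base + i for i in range(start, stop))
    return seq


if __name__ == "__main__":
    # Illustrative parameters: 512-word pages, 2-cycle hits, 6-cycle miss overhead.
    mem = PageModeMemory(page_words=512, t_hit=2, t_miss=6)
    bases = [0, 4096]  # two vectors (e.g. x and y) placed on different pages
    n = 1024
    for label, seq in [("natural", natural_order(bases, n)),
                       ("ordered, b=64", block_order(bases, n, 64))]:
        cycles = mem.cost(seq)
        print(f"{label}: {cycles} cycles, {len(seq) / cycles:.3f} words/cycle")
    # natural: 16384 cycles, 0.125 words/cycle
    # ordered, b=64: 4288 cycles, 0.478 words/cycle
```

The two streams live on different pages, so under these assumed parameters the natural order misses on every access. Grouping 64 accesses per stream leaves only one miss per group, which brings effective bandwidth close to the page-hit peak of `1 / t_hit` words per cycle. Larger groups amortize the miss overhead further. In a real system, however, they require more buffer storage for the reordered data.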

Similar resources

Access Ordering and Memory-Conscious Cache Utilization

As processor speeds increase relative to memory speeds, memory bandwidth is rapidly becoming the limiting performance factor for many applications. Several approaches to bridging this performance gap have been suggested. This paper examines one approach, access ordering, and pushes its limits to determine bounds on memory performance. We present several access-ordering schemes, and compare thei...

Access Ordering Algorithms for an Interleaved Memory

Superscalar processors are well suited for meeting the demands of scientific computing, given sufficient memory bandwidth. Employing parallel memory modules increases the bandwidth available; however, storage schemes devised to reduce module conflict for vector computers are not suitable for scalar computation. Access ordering is a compilation technique that increases effective bandwidth by re...

Improving Effective Bandwidth for Streams

Processor speeds are increasing so much faster than memory speeds that within a decade processors may spend most of their time waiting for data. The problem is already acute for computations that linearly traverse long streams of vector-like data. Although streaming computations lack the temporal locality of reference that makes caches effective, they have predictable access patterns. Since mos...

Access Order and Memory-Conscious Cache Utilization

As processor speeds increase relative to memory speeds, memory bandwidth is rapidly becoming the limiting performance factor for many applications. Several approaches to bridging this performance gap have been suggested. This paper examines one approach, access ordering, and pushes its limits to determine bounds on memory performance. We present several access-ordering schemes, and compare thei...

Increasing Memory Bandwidth for Vector Computations

Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the “Grand Challenge” scientific problems. Caching is not the sole solution for these applications due to the poor temporal and spatial locality of their data accesses. Moreover, the nature of memories themselves has changed. Achieving great...

Publication date: 1993